@hjh0119 commented Sep 11, 2025

Optimize weight synchronization between the training model and the inference engine (vLLM):

LoRA

  1. Synchronize/load only the trained adapter weights (both colocate and server modes); for server mode, pass --vllm_enable_lora true on the rollout side.
  2. In server mode, transmit flattened adapter weights to reduce the communication overhead of parameter transfer (see the sketch below).
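A minimal sketch of the flattening idea, with illustrative helper names (not the PR's actual functions): the adapter tensors are packed into one contiguous buffer plus shape metadata, sent in a single transfer, and unpacked on the rollout side.

```python
# Hedged sketch, not the PR's implementation: pack/unpack a LoRA adapter
# state_dict into a single flat tensor to cut per-tensor transfer overhead.
import torch

def flatten_adapter(state_dict):
    """Pack adapter tensors into one flat buffer plus (name, shape, numel) metadata."""
    meta = [(name, t.shape, t.numel()) for name, t in state_dict.items()]
    flat = torch.cat([t.detach().reshape(-1) for t in state_dict.values()])
    return flat, meta

def unflatten_adapter(flat, meta):
    """Rebuild the adapter state_dict from the flat buffer and metadata."""
    out, offset = {}, 0
    for name, shape, numel in meta:
        out[name] = flat[offset:offset + numel].view(shape)
        offset += numel
    return out
```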

FULL

  1. Remove the original per-tensor synchronization logic and adopt a bucketing strategy to reduce redundant communication requests and overhead, especially for MoE models, which have more tensors than dense models (a bucketing sketch follows this list).
  2. Remove the per-tensor gather and revert to batched gather; as a result, move_model_batches now works with full-parameter training.
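A minimal sketch of the bucketing idea, with an illustrative bucket size and helper name (not the PR's code): parameters are grouped into size-bounded buckets so each bucket can be flattened and synchronized with one collective call rather than one call per tensor.

```python
# Hedged sketch: group parameters into ~512 MB buckets; each bucket is then
# flattened and sent with a single broadcast/gather instead of per-tensor calls.
def bucket_parameters(named_params, bucket_bytes=512 * 1024 * 1024):
    buckets, current, current_bytes = [], [], 0
    for name, param in named_params:
        nbytes = param.numel() * param.element_size()
        if current and current_bytes + nbytes > bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append((name, param))
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets
```

This matters most for MoE models, where the large number of expert tensors makes per-tensor requests disproportionately expensive.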

Update: Built-in Accuracy Reward

  • Fixed several cases where accuracy could not be correctly evaluated.
  • Added corresponding unit tests (a minimal test sketch follows).
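As an illustration of the kind of cases such tests cover, here is a minimal pytest-style sketch, assuming a math_verify-style parse/verify backend; the built-in reward in this PR may be implemented differently.

```python
# Hedged sketch: assumes the accuracy check is backed by math_verify's
# parse/verify; the actual built-in reward may wrap this differently.
from math_verify import parse, verify

def test_equivalent_answers_accepted():
    # "1/2" and "0.5" should count as the same answer
    assert verify(parse(r"$\frac{1}{2}$"), parse("0.5"))

def test_wrong_answer_rejected():
    assert not verify(parse("42"), parse("41"))
```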

@hjh0119 changed the title [grpo] Optimize LoRA training vLLM weight synchronization → [WIP] Optimize LoRA training vLLM weight synchronization on Sep 11, 2025
@hjh0119 marked this pull request as ready for review on September 11, 2025 09:27
@hjh0119 commented Sep 11, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces an optimization for LoRA training with vLLM by enabling in-memory weight synchronization using flattened tensors. This avoids disk I/O and should improve training speed. The changes involve adding new arguments, new protocol definitions, and new methods in the rollout engine and GRPO trainer. A key part of the implementation is monkey-patching vLLM to support loading LoRA adapters from tensors. The overall approach is sound, but there are a few areas that need attention, such as ensuring deterministic adapter selection, cleaning up commented-out code, and addressing TODO comments.

@hjh0119 changed the title [WIP] Optimize LoRA training vLLM weight synchronization → Optimize LoRA training vLLM weight synchronization on Sep 12, 2025
@hjh0119 commented Sep 12, 2025

Qwen2.5-VL-7B-Instruct, server mode, tp=2, dp=2 → 10× speed-up


@hjh0119 commented Oct 13, 2025

Qwen2.5-VL-32B-Instruct (LoRA training): weight synchronization time reduced from 4 s to 1 s.

lora+ indicates that only the LoRA adapter weights are synchronized, and the parameter --vllm_enable_lora true is used on the rollout side.


@hjh0119 commented Oct 17, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces significant optimizations for weight synchronization in GRPO training with vLLM, alongside a fix for the accuracy reward calculation. For LoRA training, it cleverly syncs only the flattened adapter weights to reduce communication overhead, using a monkey-patch to allow vLLM to accept in-memory tensors. For full-parameter training, it implements a bucketing strategy to group tensors and minimize synchronization calls, which is especially beneficial for MoE models. The changes are extensive but well-structured, with corresponding updates to documentation and example scripts. I've identified one high-severity issue related to a mismatch in LoRA adapter naming that could prevent the adapter from being correctly applied during inference. Overall, this is a very strong contribution that should yield substantial performance improvements.

```python
if lora_int_ids:
    # since max_lora = 1, pick the first lora
    adapter_request = LoRARequest(
        lora_name=f'lora_{lora_int_ids[0]}',
```
gemini-code-assist bot (severity: high):

There appears to be a mismatch in how the lora_name is constructed here compared to when the adapter is added. When adding the adapter (in both colocate and server modes), the lora_name is set to the string representation of the integer ID (e.g., f'{lora_int_id}'). However, here you are prefixing it with 'lora_'. This will likely cause vLLM to fail to find the requested adapter during inference. To fix this, the prefix should be removed to ensure consistency.

Suggested change:

```diff
-        lora_name=f'lora_{lora_int_ids[0]}',
+        lora_name=str(lora_int_ids[0]),
```
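For illustration, a hedged sketch of keeping the name consistent between registering and requesting the adapter via vLLM's public LoRARequest API; the model name, path, and integer ID below are placeholders, and the PR itself supplies the adapter from in-memory tensors rather than a path.

```python
# Hedged sketch: the adapter must be requested under the same lora_name it was
# registered with. Values here are illustrative, not taken from the PR.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True, max_loras=1)

lora_int_id = 1
request = LoRARequest(
    lora_name=str(lora_int_id),   # same string used when the adapter was added
    lora_int_id=lora_int_id,
    lora_path="/tmp/adapter",     # placeholder path
)

outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
    lora_request=request,
)
```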

@hjh0119 changed the title [grpo] Optimize vLLM weight synchronization → [grpo] Optimize vLLM weight synchronization & update built-in accuracy reward on Oct 17, 2025
@hjh0119 commented Oct 17, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces significant optimizations for weight synchronization with vLLM, for both LoRA and full-parameter training. For LoRA, it enables syncing only adapter weights and uses flattened tensors to reduce overhead. For full training, it implements a bucketing strategy for more efficient weight transfer. The built-in accuracy reward function has also been improved to handle more cases correctly, and unit tests have been added.

My review focuses on ensuring the new synchronization logic is consistent and robust. I've suggested a refactoring to unify the parameter bucketing logic for full-parameter training, making it more maintainable and correct. I also pointed out a minor inconsistency in the documentation between the English and Chinese versions. Overall, these are great improvements to the training efficiency and correctness.

@hjh0119 commented Oct 17, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces significant optimizations for vLLM weight synchronization in GRPO training, covering both LoRA and full-parameter scenarios. The use of flattened tensors and a bucketing strategy to reduce communication overhead is a solid improvement. Additionally, the built-in accuracy reward mechanism has been updated to handle more cases correctly, and new unit tests have been added to ensure its robustness. The documentation is also thoroughly updated to reflect these new features. The code is well-structured, particularly the refactoring of the weight synchronization logic. I have a few minor suggestions to enhance code consistency and maintainability.

Comment on lines 90 to 92:

```python
        """
        Receives updated weights from the client process and updates the named parameter in the model.
        """
```
gemini-code-assist bot (severity: medium):

The docstring here is slightly misleading. It states that the method "updates the named parameter in the model," but it actually receives and applies a full flattened LoRA adapter. A more precise description would improve clarity.

Suggested change:

```diff
-        """
-        Receives updated weights from the client process and updates the named parameter in the model.
-        """
+        """
+        Receives and applies a flattened LoRA adapter to the model.
+        """
```

Comment on lines +685 to 688:

```python
        state_dict = prefix_removed if self.rollout_enable_lora else {
            k.replace('.base_layer', ''): v
            for k, v in prefix_removed.items()
        }
```
gemini-code-assist bot (severity: medium):

There's an inconsistency in how .base_layer is removed from parameter names. Here, k.replace('.base_layer', '') is used, while the split_batches method uses re.sub(r'\.base_layer\.', '.', name). Using re.sub is more robust as it specifically targets .base_layer. as a whole component in the name, preventing accidental replacements if .base_layer appears as part of another name. Using re.sub here would improve consistency and robustness.

Suggested change:

```diff
-        state_dict = prefix_removed if self.rollout_enable_lora else {
-            k.replace('.base_layer', ''): v
-            for k, v in prefix_removed.items()
-        }
+        state_dict = prefix_removed if self.rollout_enable_lora else {
+            re.sub(r'\.base_layer\.', '.', k): v
+            for k, v in prefix_removed.items()
+        }
```
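A small illustration of the difference, using hypothetical parameter names:

```python
import re

name = "model.layers.0.self_attn.q_proj.base_layer.weight"
# Both forms give the intended result on a normal PEFT-style name:
print(name.replace('.base_layer', ''))        # model.layers.0.self_attn.q_proj.weight
print(re.sub(r'\.base_layer\.', '.', name))   # model.layers.0.self_attn.q_proj.weight

# Hypothetical name where '.base_layer' is only a prefix of a longer component:
odd = "model.layers.0.mlp.base_layer_norm.weight"
print(odd.replace('.base_layer', ''))         # model.layers.0.mlp_norm.weight (mangled)
print(re.sub(r'\.base_layer\.', '.', odd))    # unchanged, as intended
```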

```python
    return gather_if_zero3


def patch_vllm_load_adapter():
```
gemini-code-assist bot (severity: medium):

This function monkey-patches vLLM's _load_adapter to support loading LoRA adapters from in-memory tensors. While this is a clever solution, monkey-patching can be fragile and might break with future updates to the vLLM library. To improve maintainability, it would be beneficial to add a comment in the docstring specifying which version(s) of vLLM this patch is compatible with. This will make it easier to track and update when vLLM is upgraded.
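One way to make the compatibility assumption explicit is a version guard around the patch; the version bound, warning text, and body below are illustrative, not taken from this PR.

```python
# Hedged sketch: warn when the running vLLM is newer than the versions the
# monkey-patch was validated against. The bound 0.8.5 is a placeholder.
import warnings

import vllm
from packaging import version

_PATCH_TESTED_MAX = version.parse("0.8.5")  # hypothetical upper bound

def patch_vllm_load_adapter():
    """Patch vLLM's LoRA loading to accept in-memory tensors.

    Validated against vLLM <= 0.8.5 (illustrative); revisit on upgrade.
    """
    if version.parse(vllm.__version__) > _PATCH_TESTED_MAX:
        warnings.warn(
            f"patch_vllm_load_adapter was validated up to vLLM {_PATCH_TESTED_MAX}; "
            f"running {vllm.__version__}, the patched internals may have changed.")
    # ... apply the monkey-patch here ...
```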

@hjh0119 merged commit 8eda1d3 into modelscope:main on Oct 17, 2025
1 of 2 checks passed
@hjh0119 deleted the lora+ branch on October 17, 2025 09:13